Speaker Recognition Based on Fisher Discrimination Dictionary Learning

WANG Wei, HAN Jiqing, ZHENG Tieran, ZHENG Guibin, TAO Yao

Citation: WANG Wei, HAN Jiqing, ZHENG Tieran, ZHENG Guibin, TAO Yao. Speaker Recognition Based on Fisher Discrimination Dictionary Learning[J]. Journal of Electronics & Information Technology, 2016, 38(2): 367-372. doi: 10.11999/JEIT150566

doi: 10.11999/JEIT150566

Funds: The National Natural Science Foundation of China (61071181, 61471145), The Major Research Plan of the National Natural Science Foundation of China (91120303)

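  • Abstract: Sparse representation has been successfully applied to speaker recognition, and a well-constructed dictionary plays an important role in it. This paper introduces a structured dictionary learning method based on the Fisher criterion into a speaker recognition system. During discriminative dictionary learning, each dictionary corresponds to a class label, so training samples of a given class are reconstructed with small error by the dictionary of that class. At the same time, the sparse coding coefficients of the training samples are constrained so that within-class variation is minimized and between-class variation is maximized. On the NIST SRE 2003 database, the proposed algorithm achieves an equal error rate (EER) of 7.62%, while an i-vector system with cosine distance scoring achieves an EER of 6.7%. Fusing the two systems yields an EER of 5.07%.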
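The two ideas in the abstract (class-specific reconstruction error and a Fisher criterion on the coding coefficients) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `fisher_term` and `classify_by_residual` are hypothetical, the toy matrices are made up for illustration, and ordinary least squares stands in for the sparse coding step that an actual system would use.

```python
import numpy as np

def fisher_term(X, labels):
    """Within-class minus between-class scatter (traces) of coefficient columns X.

    X has shape (n_atoms, n_samples); a smaller value means the codes are
    tighter within each class and farther apart between classes.
    """
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=1, keepdims=True)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        class_mean = Xc.mean(axis=1, keepdims=True)
        within += np.sum((Xc - class_mean) ** 2)
        between += Xc.shape[1] * np.sum((class_mean - overall_mean) ** 2)
    return within - between

def classify_by_residual(y, sub_dicts):
    """Assign y to the class whose sub-dictionary reconstructs it best.

    Assumption: least squares replaces sparse coding to keep the sketch short.
    """
    residuals = []
    for D in sub_dicts:
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        residuals.append(float(np.linalg.norm(y - D @ coef)))
    return int(np.argmin(residuals)), residuals

# Toy data (hypothetical): two classes whose sub-dictionaries span different subspaces.
D0 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]])
D1 = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0, 0.0, 0.0])  # lies in the span of D0
predicted, residuals = classify_by_residual(y, [D0, D1])

# Coefficients that cluster by class score lower than the same codes with mixed labels.
X = np.array([[1.0, 1.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.9]])
separated = fisher_term(X, [0, 0, 1, 1])
mixed = fisher_term(X, [0, 1, 0, 1])
```

In this toy setting the test vector is assigned to the first class because its own sub-dictionary reconstructs it exactly, and the Fisher term is lower when the labels match the natural clusters of the coefficients.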
Publication history
  • Received:  2015-05-13
  • Revised:  2015-09-06
  • Published:  2016-02-19
